This notebook will get you started with downloading, exploring and analysing the input and output data of the challenge.
The proposed challenge will focus on crop type classification based on a time-series input of Sentinel-1, Sentinel-2 and Planet Fusion data. The challenge will cover two areas of interest, in Germany and South Africa, with high-quality cadastral data on field boundaries and crop types as ground truth input.
The challenge will consist of two tracks:
The participants will not be required to participate in both challenges. However, the evaluation mechanism behind both tracks are the same, as well as the rules and prize catalogue.
This notebook showcases how to download and process the data, but you are free to use any open souce Python library specifically designed to deal with Earth Observation data such as eo-learn. In this notebook, the data is stored as tif images, numpy arrays and geopandas dataframes to facilitate processing operations and torch is preferred for data processing and training. However, you can use any other Python tool of preference to process the provided data.
The notebook also showcases how to generate a valid submission file.
As per challenge rules, the following applies:
Code for the winning solutions will be reviewed to ensure rules have been followed.
The content of the notebook is as follows:
# ensure you have the required python packages
import sys
!{sys.executable} -m pip install -r requirements.txt
# Jupyter notebook related
%load_ext autoreload
%autoreload 2
%matplotlib inline
# Built-in modules
import os
import glob
import json
from typing import Tuple, List
from datetime import datetime, timedelta
import pickle
import shutil
import warnings
warnings.filterwarnings('ignore')
# Basics of Python data handling and visualization
import numpy as np
import pandas as pd
import geopandas as gpd
import matplotlib.pyplot as plt
from matplotlib.patches import Patch
from matplotlib.colors import ListedColormap
from tqdm.auto import tqdm
# Data reding for training validation purposes:
from utils.data_transform import PlanetTransform, Sentinel1Transform, Sentinel2Transform
from utils.planet_reader import PlanetReader
from utils.sentinel_1_reader import S1Reader
from utils.sentinel_2_reader import S2Reader
from utils.data_loader import DataLoader
from utils.baseline_models import SpatiotemporalModel
from utils import train_valid_eval_utils as tveu
from torch.optim import Adam
from torch.nn import CrossEntropyLoss
from torch.nn import NLLLoss
import torch
from torch.utils.tensorboard import SummaryWriter
from sklearn.metrics import confusion_matrix
In order to download the data through the provided APIs you would need to set-up an account for Radiant Earth Foundation. Check the following options.
TODO: WAITING INPUTS FROM RADIANT EARTH (e.g. scripts and credentials to download data into notebook/data directory of this notebook.)
TODO: WAITING INPUTS FROM RADIANT EARTH (e.g. scripts and credentials to download data into notebook/data directory of this notebook.)
Check the challenge description for alternative options to retrieve the data as a single zipped file.
This section gives an overview of all the necessary data to train your model and how to download them.
In general, the data sources to download are the following:
NOTE: It is not mandatory to exploit all of the data sources. The challenge participants are free to choose a subset or a combination of data sources to exploit for model training purposes.
The AOIs chosen for this challenge are in the Brandenburg State of Germany and the Republic of South Africa as detailed below.
Brandenburg-Germany Data
The Brandenburg-Germany data contains following time series in UTM zone 33N (i.e. epsg:32633) from Sentinel-1, Sentinel-2 and Planet Fusion:
crop_ids and crop_names based on cadastral data.crop_ids and crop_names which are reserved for the evaluation of challenge submissions. South Africa Data
The South-Africa data contains following time series in UTM zone 34S (i.e. epsg:32734) from Sentinel-1, Sentinel-2 and Planet Fusion:
For Training:
crop_ids and crop_names based on cadastral data.For Testing:
crop_ids and crop_names which are reserved for the evaluation of challenge submissions. In order to check the AOI for Brandenburg, you can start with the ground-truth files:
After your download, it is supposed to be placed in data folder of this notebook:
brandenburg_tr_labels_dir='data/brandenburg-gt/br-18E-242N-crop-labels-train-2018.geojson'
brandenburg_te_labels_dir='data/brandenburg-gt/br-17E-243N-crop-labels-test-2019.geojson'
In order to explore data types in the ground truth of 2018 (for training), you can run the following cell. It will summarize the geojson data belonging to the AOI, which have 2534 entries, and 5 columns representing field ID, area of the field in square meters, length of the field in meters, crop ID, crop name and the geomtry of the field as polygon:
#CHECK TARGET DATA FORMAT IN TRAINING GROUND-TRUTH POLYGONS:
brandenburg_tr_labels=gpd.read_file(brandenburg_tr_labels_dir)
print('INFO: Number of fields: {}\n'.format(len(brandenburg_tr_labels)))
brandenburg_tr_labels.info()
brandenburg_tr_labels.tail()
In order to explore data types in the ground truth of 2019 (for the evaluation of challenge), you can run the following cell. It will summarize the geojson data belonging to the AOI, which have 2064 entries, and 5 columns representing field ID, area of the field in square meters, length of the field in meters, crop ID, crop name and the geomtry of the field as polygon, as same with the example above. However the column crop_id has always value 0, and crop_name is only No Data, because they are reserved for the evaluation of the challenge. Participants of the challenge can only benefit from the geometry column to make predictions for each field bounded by a polygon:
#CHECK TARGET DATA FORMAT IN EVALUATION GROUND-TRUTH POLYGONS:
brandenburg_te_labels=gpd.read_file(brandenburg_te_labels_dir)
print('INFO: Number of fields: {}\n'.format(len(brandenburg_te_labels)))
brandenburg_te_labels.info()
brandenburg_te_labels.tail()
When you look at the crop labels and crop names in training ground-truths you are supposed to see 9 crop types with the following IDs:
#CHECK LABEL IDs AND LABEL NAMES:
label_ids=brandenburg_tr_labels['crop_id'].unique()
label_names=brandenburg_tr_labels['crop_name'].unique()
print('INFO: Label IDs: {}'.format(label_ids))
print('INFO: Label Names: {}'.format(label_names))
These plant types are not evenly planted in the agricultural fields, so you can observe the distribution of fields with each particular crop as below:
#CHECK FIELD DISTRIBUTION BY LABEL:
value_counts=brandenburg_tr_labels['crop_name'].value_counts()
colors_list = ['#78C850','#A8B820','#F8D030','#E0C068', '#F08030', '#C03028', '#F85888','#6890F0','#98D8D8']
ax=value_counts.plot.bar(color=colors_list)
ax.set_ylabel("Number of Fields")
ax.set_xlabel("Crop Types")
print('INFO: Number of Fields by Crop Type: \n{}'.format(value_counts))
However, if we look at the per hectare distribution of each crop, we will see a different distribution, because some crops seem usually planted in larger areas:
#CHECK TOTAL HECTARE DISTRIBUTION BY LABEL:
hectare_distribution = pd.DataFrame(columns=["crop_name", "total_hectare"])
for name, group in brandenburg_tr_labels.groupby('crop_name'):
total_hectare=group['SHAPE_AREA'].sum()/10000 # convert to m2 to hectare
hectare_distribution=hectare_distribution.append({'crop_id':group.iloc[0]['crop_id'], 'crop_name':name, 'total_hectare':total_hectare}, ignore_index=True)
hectare_distribution.set_index('crop_id', inplace=True)
colors_list = ['#78C850','#A8B820','#F8D030','#E0C068', '#F08030', '#C03028', '#F85888','#6890F0','#98D8D8']
ax=hectare_distribution.plot.barh(color=colors_list,x='crop_name', y='total_hectare',legend=False)
ax.set_xlabel("Total Hectare per Crop Type")
ax.set_ylabel("Crop Types")
print('INFO: Total Hectare per Crop Type: \n{}'.format(hectare_distribution.sort_index()))
Moreover, if we look at how fragmented the fields for a crop type, by counting the number of fields in different hectare bins:
#CHECK HECTARE DISTRIBUTION HISTOGRAM BY LABEL:
#Convert m2 to hectare:
histogram_data = brandenburg_tr_labels.copy(deep=True)
histogram_data['SHAPE_AREA']=brandenburg_tr_labels['SHAPE_AREA']/10000
ax=histogram_data.hist( by='crop_name',column = 'SHAPE_AREA', bins=25,figsize=(16,16))
for i in range(ax.shape[0]):
for j in range(ax.shape[1]):
ax[i][j].set_ylabel("Number of fields with the given hectare size")
ax[i][j].set_xlabel("Hectare")
For Brandenburg, the ground-truths of year 2018 are from tile 18E-242N as can be observed below:
#DISPLAY TARGET FIELDS of 2018 FOR TRAINING ON THE MAP BY LABEL:
fig, ax = plt.subplots(figsize=(18, 18))
counter=0
legend_elements = []
for name, group in brandenburg_tr_labels.groupby('crop_name'):
group.plot(ax=ax,color=colors_list[counter], aspect=1)
legend_elements.append(Patch(facecolor=colors_list[counter], edgecolor=colors_list[counter],label=name))
counter+=1
ax.legend(handles=legend_elements,loc='lower right')
ax.title.set_text('SOUTH AFRICA 2017: GROUND TRUTH POLYGONS WITH CROP LABELS for TRAINING')
For Brandenburg, the ground-truths of year 2019 are from tile 18E-242N as can be observed below, however as you can notice, only polygons are given to you without crop names and crop IDs, because they are reserved for the evaluation of the challenge.
#DISPLAY TARGET FIELDS of 2019 WITHOUT LABELS :
brandenburg_te=gpd.read_file(brandenburg_te_labels_dir)
fig, ax = plt.subplots(figsize=(18, 18))
counter=0
legend_elements = []
for name, group in brandenburg_te.groupby('crop_name'):
group.plot(ax=ax,color=colors_list[counter], aspect=1)
legend_elements.append(Patch(facecolor=colors_list[counter], edgecolor=colors_list[counter],label=name))
counter+=1
ax.legend(handles=legend_elements,loc='lower right')
ax.title.set_text('BRANDENBURG 2019: GROUND TRUTH POLYGONS WITHOUT CROP LABELS for the EVALUATION')
The input images are from Sentinel-1, Sentinel-2 and Planet Fusion, with following details:
[blue, green, red, NIR] channels in order, with 3-meter resolution and the provided data is in TIFF imaging format. For more details, you can refer to the specifications of PLANETSCOPE. [VV, VH, ANGLE] where V and H stand for vertical and horizontal orientations, respectively, and ANGLE stores the angle of observation to the earth surface as described here. The data is collected in Interferometric Wide (IW) swath mode and it includes both ascending and descending orbit directions. For further information about Sentinel-1 imagery, please refer to Sentinel Hub. [B01, B02, B03, B04, B05, B06, B07, B08, B8A, B09, B11, B12]. The bands that have original spatial resolution of 20m and 60m are interpolated with a nearest-neighbour method to a 10m resolution. More information about the interpolation process here. Moreover, the cloud probability mask CLP is also provided, for more details please refer to Sentinel Hub.#DIRECTORY OF PLANET FUSION TRAINING DATA AND GROUND TRUTHS:
brandenburg_planet_train_dir='data/planet/UTM-24000/33N/18E-242N/PF-SR/'
brandenburg_tr_labels_dir='data/brandenburg-gt/br-18E-242N-crop-labels-train-2018.geojson'
#INITIALIZE THE DATA READER TO OBSERVE THE FIELDS FROM PLANET DATA:
# Choose some days of the year to plot
selected_days_of_year = [10,20,30,40, 50] #from 365 days of the year
#Initialize data reader for planet images
planet_reader = PlanetReader(input_dir=brandenburg_planet_train_dir,
label_dir=brandenburg_tr_labels_dir,
selected_time_points=selected_days_of_year)
#DEFINE TRUE COLOR IMAGING AND NDVI INDEXING FUNCTIONS FOR VISUALISATION OF PLANET DATA:
#Define NDVI index for Planet Fusion images
def ndvi(X):
red = X[2]
nir = X[3]
return (nir-red) / (nir + red)
#Define True Color for Planet Fusion images
def true_color(X):
blue = X[0]/(X[0].max()/255.0)
green = X[1]/(X[1].max()/255.0)
red = X[2]/(X[2].max()/255.0)
tc = np.dstack((red,green,blue))
return tc.astype('uint8')
#VISUALISE SOME OF THE FIELDS FROM PLANET DATA:
#Initialize plot cells
num_row = 2 * len(selected_days_of_year)
num_col = len(label_ids)
fig, axes = plt.subplots(num_row, num_col, figsize=(2*num_col,2*num_row))
#Display one sample field for each crop type
pbar = tqdm(total=len(label_ids))
iterable=iter(planet_reader)
for crop_id, crop_name in zip(label_ids,label_names):
while True:
# read a field sample
X,y,mask,_ = next(iterable)
width=X.shape[-1]
height=X.shape[-2]
#Get one sample for each crop type, and
#consider large areas (at least 200x200) to display
if y == crop_id and width>200 and height>200:
for i, day in enumerate(selected_days_of_year):
# Display RGB image of the field in a given week for a given crop type
ax = axes[(2*i)%num_row, crop_id%num_col]
ax.title.set_text('{}'.format(crop_name))
ax.set_ylabel('RGB in Day {}'.format(day))
ax.set_xticks([])
ax.set_yticks([])
ax.imshow(true_color(X[i]))
# Display NDVI index of the field in a given day for a given crop type
ax = axes[(2*i+1)%num_row, crop_id%num_col]
ax.title.set_text('{}'.format(crop_name))
ax.set_ylabel('NDVI in Day {}'.format(day))
ax.set_xticks([])
ax.set_yticks([])
ax.imshow(ndvi(X[i]*mask),cmap=plt.cm.summer)
#if one sample selected for a crop type, break the WHILE loop
pbar.set_description("Plotting {} Fields".format(crop_name))
pbar.update(1)
break
plt.tight_layout()
plt.show()
pbar.set_description("Plotting Complete!")
pbar.close()
#DIRECTORY OF SENTINEL-1 TRAINING DATA :
brandenburg_s1_asc_train_dir = "data/sentinel-1/s1-asc-utm-33N-18E-242N-2018.zip" #ASCENDING ORBIT
brandenburg_s1_dsc_train_dir = "data/sentinel-1/s1-dsc-utm-33N-18E-242N-2018.zip" #DESCENDING ORBIT
In your training and testing, you can utilize both ascending and descending orbit data. However, in this notebook, we will only demonstrate ascending orbit data. Changing the input_dir of S1Reader in the following cell is sufficient to explore descending orbit data.
#INITIALIZE THE DATA READER TO OBSERVE THE FIELDS FROM S1 DATA:
# Choose some days of the year to plot
selected_data_indices = [10,20,30,40,50] #beware that S1 data is not daily,
#Initialize data reader for planet images
s1_reader = S1Reader(input_dir=brandenburg_s1_asc_train_dir,
label_dir=brandenburg_tr_labels_dir,
selected_time_points=selected_data_indices)
# DEFINE RADAR VEGETATION INDEXING FOR VISUALISATION OF S1 DATA:
# for the algorithm please refer to Sentinel Hub:
# https://custom-scripts.sentinel-hub.com/custom-scripts/sentinel-1/radar_vegetation_index_code_dual_polarimetric/
def rvi(X):
VV = X[0]
VH = X[1]
dop = (VV/(VV+VH))
m = 1 - dop
radar_vegetation_index = (np.sqrt(dop))*((4*(VH))/(VV+VH))
return radar_vegetation_index
#VISUALISE SOME OF THE FIELDS FROM S1 DATA:
#Initialize plot cells
num_row = len(selected_data_indices)
num_col = len(label_ids)
fig, axes = plt.subplots(num_row, num_col, figsize=(2*num_col,2*num_row))
#Display one sample field for each crop type
pbar = tqdm(total=len(label_ids))
iterable=iter(s1_reader)
for crop_id, crop_name in zip(label_ids,label_names):
while True:
# read a field sample
X,y,mask,_ = next(iterable)
width=X.shape[-1]
height=X.shape[-2]
#Get one sample for each crop type, and
#consider large areas (at least 60x60) to display
if y == crop_id and width>60 and height>60:
for i, day in enumerate(selected_data_indices):
# Display RVI index of the field in a given day for a given crop type
ax = axes[i%num_row, crop_id%num_col]
ax.title.set_text('{}'.format(crop_name))
ax.set_ylabel('RVI in Day {}'.format(day))
ax.set_xticks([])
ax.set_yticks([])
ax.imshow(rvi(X[i]*mask),cmap=plt.cm.summer)
#if one sample selected for a crop type, break the WHILE loop
pbar.set_description("Plotting {} Fields".format(crop_name))
pbar.update(1)
break
plt.tight_layout()
plt.show()
pbar.set_description("Plotting Complete!")
pbar.close()
Sentinel-2 (S2) can be initialized and called as similar to Sentinel-1 as demonstrated above. The only difference, you need to change the data reader from S1Reader to S2Reader and change the data links accordingly as shown below. If you get an error during the initialization of the S2Reader it might be due to unsufficient memory in your working environment because Sentinel-2 data at data/sentinel-2/s2-utm-33N-18E-242N-2018.zip is about 12GB:
#DIRECTORY OF SENTINEL-2 TRAINING DATA :
brandenburg_s2_train_dir = "data/sentinel-2/s2-utm-33N-18E-242N-2018.zip"
#INITIALIZE THE DATA READER TO OBSERVE THE FIELDS FROM PLANET DATA:
# Choose some days of the year to plot
selected_data_indices = [10,20,30,40,50] #beware that S1 data is not daily,
#Initialize data reader for planet images
s2_reader = S2Reader(input_dir=brandenburg_s2_train_dir,
label_dir=brandenburg_tr_labels_dir,
selected_time_points=selected_data_indices)
In order to check the AOI for South Africa, you can start with the ground-truth files:
After your download, it is supposed to be placed in data folder of this notebook:
south_africa_tr_labels_dir_1='data/south-africa-gt/sa-19E-258N-crop-labels-train-2017.geojson'
south_africa_tr_labels_dir_2='data/south-africa-gt/sa-19E-259N-crop-labels-train-2017.geojson'
south_africa_te_labels_dir ='data/south-africa-gt/sa-20E-259N-crop-labels-test-2017.geojson'
In order to explore data types in the ground truth at 19E-258N and 19E-259N (for training), you can run the following cell. It will summarize the geojson data belonging to the AOI, which have 1715 entries in the first ground-truth and 2436 entries in the second ground-truth, and 5 columns representing field ID, area of the field in square meters, length of the field in meters, crop ID, crop name and the geomtry of the field as polygon:
#CHECK TARGET DATA FORMAT IN TRAINING GROUND-TRUTH POLYGONS:
south_africa_tr_labels_1=gpd.read_file(south_africa_tr_labels_dir_1)
print('INFO: Number of fields: {}\n'.format(len(south_africa_tr_labels_1)))
south_africa_tr_labels_1.info()
south_africa_tr_labels_1.tail()
INFO: Number of fields: 1715 <class 'geopandas.geodataframe.GeoDataFrame'> RangeIndex: 1715 entries, 0 to 1714 Data columns (total 6 columns): # Column Non-Null Count Dtype --- ------ -------------- ----- 0 fid 1715 non-null int64 1 SHAPE_AREA 1715 non-null float64 2 SHAPE_LEN 1715 non-null float64 3 crop_id 1715 non-null int64 4 crop_name 1715 non-null object 5 geometry 1715 non-null geometry dtypes: float64(2), geometry(1), int64(2), object(1) memory usage: 80.5+ KB
| fid | SHAPE_AREA | SHAPE_LEN | crop_id | crop_name | geometry | |
|---|---|---|---|---|---|---|
| 1710 | 272638 | 237553.921793 | 2062.209147 | 2 | Barley | MULTIPOLYGON (((479711.541 6214292.977, 479705... |
| 1711 | 272639 | 142707.098365 | 2524.575182 | 1 | Wheat | MULTIPOLYGON (((477055.645 6199524.810, 477037... |
| 1712 | 272759 | 14774.502277 | 1020.000876 | 5 | Small grain grazing | MULTIPOLYGON (((465387.091 6197134.178, 465380... |
| 1713 | 272760 | 14323.177324 | 704.568254 | 4 | Lucerne/Medics | MULTIPOLYGON (((463318.045 6200831.958, 463306... |
| 1714 | 272762 | 106682.754574 | 1482.252585 | 4 | Lucerne/Medics | MULTIPOLYGON (((462690.755 6200783.963, 462469... |
#CHECK TARGET DATA FORMAT IN TRAINING GROUND-TRUTH POLYGONS:
south_africa_tr_labels_2=gpd.read_file(south_africa_tr_labels_dir_2)
print('INFO: Number of fields: {}\n'.format(len(south_africa_tr_labels_2)))
south_africa_tr_labels_2.info()
south_africa_tr_labels_2.tail()
INFO: Number of fields: 2436 <class 'geopandas.geodataframe.GeoDataFrame'> RangeIndex: 2436 entries, 0 to 2435 Data columns (total 6 columns): # Column Non-Null Count Dtype --- ------ -------------- ----- 0 fid 2436 non-null int64 1 SHAPE_AREA 2436 non-null float64 2 SHAPE_LEN 2436 non-null float64 3 crop_id 2436 non-null int64 4 crop_name 2436 non-null object 5 geometry 2436 non-null geometry dtypes: float64(2), geometry(1), int64(2), object(1) memory usage: 114.3+ KB
| fid | SHAPE_AREA | SHAPE_LEN | crop_id | crop_name | geometry | |
|---|---|---|---|---|---|---|
| 2431 | 270980 | 103201.583620 | 1167.637967 | 4 | Lucerne/Medics | MULTIPOLYGON (((456615.950 6227562.739, 456606... |
| 2432 | 271186 | 29028.409640 | 1339.613960 | 4 | Lucerne/Medics | MULTIPOLYGON (((456726.354 6227938.500, 456730... |
| 2433 | 271187 | 1756.641790 | 186.496241 | 4 | Lucerne/Medics | MULTIPOLYGON (((456476.993 6227877.458, 456469... |
| 2434 | 271188 | 6319.516614 | 496.304342 | 4 | Lucerne/Medics | MULTIPOLYGON (((456496.040 6227596.795, 456502... |
| 2435 | 271189 | 5078.729177 | 432.672236 | 4 | Lucerne/Medics | MULTIPOLYGON (((456699.919 6227590.255, 456687... |
In order to explore data types in the ground truth at 20E-259N (for the evaluation of challenge), you can run the following cell. It will summarize the geojson data belonging to the AOI, which have 2417 entries, and 5 columns representing field ID, area of the field in square meters, length of the field in meters, crop ID, crop name and the geomtry of the field as polygon, as same with the example above. However the column crop_id has always value 0, and crop_name is only No Data, because they are reserved for the evaluation of the challenge. Participants of the challenge can only benefit from the geometry column to make predictions for each field bounded by a polygon:
#CHECK TARGET DATA FORMAT IN EVALUATION GROUND-TRUTH POLYGONS:
south_africa_te_labels=gpd.read_file(south_africa_te_labels_dir)
print('INFO: Number of fields: {}\n'.format(len(south_africa_te_labels)))
south_africa_te_labels.info()
south_africa_te_labels.tail()
INFO: Number of fields: 2417 <class 'geopandas.geodataframe.GeoDataFrame'> RangeIndex: 2417 entries, 0 to 2416 Data columns (total 6 columns): # Column Non-Null Count Dtype --- ------ -------------- ----- 0 fid 2417 non-null int64 1 SHAPE_AREA 2417 non-null float64 2 SHAPE_LEN 2417 non-null float64 3 crop_id 2417 non-null int64 4 crop_name 2417 non-null object 5 geometry 2417 non-null geometry dtypes: float64(2), geometry(1), int64(2), object(1) memory usage: 113.4+ KB
| fid | SHAPE_AREA | SHAPE_LEN | crop_id | crop_name | geometry | |
|---|---|---|---|---|---|---|
| 2412 | 272354 | 126740.775270 | 1459.240753 | 0 | No Data | MULTIPOLYGON (((494072.643 6234366.595, 493923... |
| 2413 | 272355 | 144071.881716 | 1562.941091 | 0 | No Data | MULTIPOLYGON (((497143.580 6229113.759, 497128... |
| 2414 | 272429 | 44331.637211 | 836.115387 | 0 | No Data | MULTIPOLYGON (((493125.600 6234793.677, 493111... |
| 2415 | 272640 | 345984.933132 | 2934.752582 | 0 | No Data | MULTIPOLYGON (((487221.594 6224539.396, 487737... |
| 2416 | 272740 | 7384.029432 | 348.852724 | 0 | No Data | MULTIPOLYGON (((485258.535 6221836.838, 485196... |
When you look at the crop labels and crop names in training ground-truths you are supposed to see 5 crop types with the following IDs:
#CHECK LABEL IDs AND LABEL NAMES:
label_ids=south_africa_tr_labels_2['crop_id'].unique()
label_names=south_africa_tr_labels_2['crop_name'].unique()
print('INFO: Label IDs: {}'.format(label_ids))
print('INFO: Label Names: {}'.format(label_names))
INFO: Label IDs: [4 2 5 1 3] INFO: Label Names: ['Lucerne/Medics' 'Barley' 'Small grain grazing' 'Wheat' 'Canola']
These plant types are not evenly planted in the agricultural fields, so you can observe the distribution of fields with each particular crop as below:
#CHECK FIELD DISTRIBUTION BY LABEL:
value_counts =south_africa_tr_labels_1['crop_name'].value_counts()
value_counts += south_africa_tr_labels_2['crop_name'].value_counts()
colors_list = ['#78C850','#A8B820','#F8D030','#E0C068', '#F08030', '#C03028', '#F85888','#6890F0','#98D8D8']
ax=value_counts.plot.bar(color=colors_list)
ax.set_ylabel("Number of Fields")
ax.set_xlabel("Crop Types")
print('INFO: Number of Fields by Crop Type: \n{}'.format(value_counts))
INFO: Number of Fields by Crop Type: Lucerne/Medics 1792 Wheat 753 Canola 512 Barley 661 Small grain grazing 433 Name: crop_name, dtype: int64
However, if we look at the per hectare distribution of each crop, we will see a different distribution, because some crops seem usually planted in larger areas:
#CHECK TOTAL HECTARE DISTRIBUTION BY LABEL:
hectare_distribution = pd.DataFrame(columns=["crop_name", "total_hectare"])
for group_1, group_2 in zip(south_africa_tr_labels_1.groupby('crop_name'),
south_africa_tr_labels_2.groupby('crop_name')):
crop_id=group_1[1].iloc[0]['crop_id']
crop_name=group_1[0]
total_hectare_1=group_1[1]['SHAPE_AREA'].sum()/10000 # convert to m2 to hectare
total_hectare_2=group_2[1]['SHAPE_AREA'].sum()/10000 # convert to m2 to hectare
total_hectare= total_hectare_1 + total_hectare_2
hectare_distribution=hectare_distribution.append({'crop_id':crop_id, 'crop_name':crop_name, 'total_hectare':total_hectare}, ignore_index=True)
hectare_distribution.set_index('crop_id', inplace=True)
colors_list = ['#78C850','#A8B820','#F8D030','#E0C068', '#F08030', '#C03028', '#F85888','#6890F0','#98D8D8']
ax=hectare_distribution.plot.barh(color=colors_list,x='crop_name', y='total_hectare',legend=False)
ax.set_xlabel("Total Hectare per Crop Type")
ax.set_ylabel("Crop Types")
print('INFO: Total Hectare per Crop Type: \n{}'.format(hectare_distribution.sort_index()))
INFO: Total Hectare per Crop Type:
crop_name total_hectare
crop_id
1.0 Wheat 14060.633983
2.0 Barley 11491.487593
3.0 Canola 9889.008658
4.0 Lucerne/Medics 20795.063533
5.0 Small grain grazing 6013.529840
For South Africa, the ground-truths at tiles 19E-258N and 19E-259N as can be observed below:
#DISPLAY TARGET FIELDS AT TILES '19E-258N' and '19E-259N' FOR TRAINING ON THE MAP BY LABEL:
fig, axes = plt.subplots(1,2, figsize=(18, 18))
counter=0
legend_elements = []
for group_1, group_2 in zip(south_africa_tr_labels_1.groupby('crop_name'),
south_africa_tr_labels_2.groupby('crop_name')):
group_1[1].plot(ax=axes[0],color=colors_list[counter], aspect=1)
group_2[1].plot(ax=axes[1],color=colors_list[counter], aspect=1)
legend_elements.append(Patch(facecolor=colors_list[counter], edgecolor=colors_list[counter],label=group_1[0]))
counter+=1
axes[0].legend(handles=legend_elements,loc='lower right')
axes[0].title.set_text('SOUTH AFRICA 2017: GROUND TRUTH POLYGONS AT 19E-258N WITH CROP LABELS')
axes[1].legend(handles=legend_elements,loc='lower right')
axes[1].title.set_text('SOUTH AFRICA 2017: GROUND TRUTH POLYGONS AT 19E-259N WITH CROP LABELS')
For South Africa, the ground-truths at tile 20E-259N as can be observed below, however as you can notice, only polygons are given to you without crop names and crop IDs, because they are reserved for the evaluation of challenge.
#DISPLAY TARGET FIELDS AT TILES '20E-259N' WITHOUT LABELS :
south_africa_te_labels=gpd.read_file(south_africa_te_labels_dir)
fig, ax = plt.subplots(figsize=(18, 18))
counter=0
legend_elements = []
for name, group in south_africa_te_labels.groupby('crop_name'):
group.plot(ax=ax,color=colors_list[counter], aspect=1)
legend_elements.append(Patch(facecolor=colors_list[counter], edgecolor=colors_list[counter],label=name))
counter+=1
ax.legend(handles=legend_elements,loc='lower right')
ax.title.set_text('SOUTH AFRICA 2017: GROUND TRUTH POLYGONS AT 20E-259N WITHOUT CROP LABELS')
The input images are from Sentinel-1, Sentinel-2 and Planet Fusion, with following details:
[blue, green, red, NIR] channels in order, with 3-meter resolution and the provided data is in TIFF imaging format. For more details, you can refer to the specifications of PLANETSCOPE. [VV, VH, ANGLE] where V and H stand for vertical and horizontal orientations, respectively, and ANGLE stores the angle of observation to the earth surface as described here. The data is collected in Interferometric Wide (IW) swath mode and it includes both ascending and descending orbit directions. For further information about Sentinel-1 imagery, please refer to Sentinel Hub. [B01, B02, B03, B04, B05, B06, B07, B08, B8A, B09, B11, B12]. The bands that have original spatial resolution of 20m and 60m are interpolated with a nearest-neighbour method to a 10m resolution. More information about the interpolation process here. Moreover, the cloud probability mask CLP is also provided, for more details please refer to Sentinel Hub.#DIRECTORY OF PLANET FUSION TRAINING DATA AND LABELS:
south_africa_planet_train_dir_1='data/planet/UTM-24000/34S/19E-258N/PF-SR/'
south_africa_tr_labels_dir_1='data/south-africa-gt/sa-19E-258N-crop-labels-train-2017.geojson'
south_africa_planet_train_dir_2='data/planet/UTM-24000/34S/19E-259N/PF-SR/'
south_africa_tr_labels_dir_2='data/south-africa-gt/sa-19E-259N-crop-labels-train-2017.geojson'
#INITIALIZE THE DATA READER TO OBSERVE THE FIELDS FROM PLANET DATA:
# Choose some days of the year to plot
selected_days_of_year = [1, 2, 3, 4, 5] #from 244 days of the year
#Initialize data reader for planet images
planet_reader = PlanetReader(input_dir=south_africa_planet_train_dir_1,
label_dir=south_africa_tr_labels_dir_1,
selected_time_points=selected_days_of_year)
INFO: Coordinate system of the data is: EPSG:32734 INFO: Ignoring 3/1715 fields with area < 1000m2
INFO: Extracting time series into the folder: /local_home/kuzu_ri/GIT_REPO/starter-pack-ai4food-v0.0.1/notebook/data/planet/UTM-24000/34S/19E-258N/PF-SR/time_series: 100%|██████████| 1712/1712 [03:07<00:00, 9.14it/s]
#VISUALISE SOME OF THE FIELDS FROM PLANET DATA:
#Initialize plot cells
num_row = 2 * len(selected_days_of_year)
num_col = len(label_ids)
fig, axes = plt.subplots(num_row, num_col, figsize=(2.5*num_col,2*num_row))
#Display one sample field for each crop type
pbar = tqdm(total=len(label_ids))
iterable=iter(planet_reader)
for crop_id, crop_name in zip(label_ids,label_names):
while True:
# read a field sample
X,y,mask,fid = next(iterable)
width=X.shape[-1]
height=X.shape[-2]
#Get one sample for each crop type, and
#consider large areas (at least 200x200) to display
if y == crop_id and width>200 and height>200:
for i, day in enumerate(selected_days_of_year):
# Display RGB image of the field in a given week for a given crop type
ax = axes[(2*i)%num_row, crop_id%num_col]
ax.title.set_text('{}'.format(crop_name))
ax.set_ylabel('RGB in Day {}'.format(day))
ax.set_xticks([])
ax.set_yticks([])
ax.imshow(true_color(X[i]))
# Display NDVI index of the field in a given day for a given crop type
ax = axes[(2*i+1)%num_row, crop_id%num_col]
ax.title.set_text('{}'.format(crop_name))
ax.set_ylabel('NDVI in Day {}'.format(day))
ax.set_xticks([])
ax.set_yticks([])
ax.imshow(ndvi(X[i]*mask),cmap=plt.cm.summer)
#if one sample selected for a crop type, break the WHILE loop
pbar.set_description("Plotting {} Fields".format(crop_name))
pbar.update(1)
break
plt.tight_layout()
plt.show()
pbar.set_description("Plotting Complete!")
pbar.close()
#DIRECTORY OF SENTINEL-1 TRAINING DATA:
south_africa_s1_asc_train_dir_1 = "data/sentinel-1/s1-asc-utm-34S-19E-258N-2017.zip"
south_africa_s1_asc_train_dir_2 = "data/sentinel-1/s1-asc-utm-34S-19E-259N-2017.zip"
#INITIALIZE THE DATA READER TO OBSERVE THE FIELDS FROM S1 DATA:
# Choose some days of the year to plot
selected_data_indices = [10,15,20,25,30] #beware that S1 data is not daily,
#Initialize data reader for planet images
s1_reader = S1Reader(input_dir=south_africa_s1_asc_train_dir_1,
label_dir=south_africa_tr_labels_dir_1,
selected_time_points=selected_data_indices)
INFO: Found folder in data/sentinel-1/s1-asc-utm-34S-19E-258N-2017, no need to unzip INFO: Ignoring 3/1715 fields with area < 1000m2
INFO: Extracting time series into the folder: data/sentinel-1/s1-asc-utm-34S-19E-258N-2017/time_series: 100%|██████████| 1712/1712 [00:00<00:00, 6350.32it/s]
#VISUALISE SOME OF THE FIELDS FROM S1 DATA:
#Initialize plot cells
num_row = len(selected_data_indices)
num_col = len(label_ids)
fig, axes = plt.subplots(num_row, num_col, figsize=(2*num_col,2*num_row))
#Display one sample field for each crop type
pbar = tqdm(total=len(label_ids))
iterable=iter(s1_reader)
for crop_id, crop_name in zip(label_ids,label_names):
while True:
# read a field sample
X,y,mask,fid = next(iterable)
width=X.shape[-1]
height=X.shape[-2]
#Get one sample for each crop type, and
#consider large areas (at least 60x60) to display
if y == crop_id and width>60 and height>60:
for i, day in enumerate(selected_data_indices):
# Display RVI index of the field in a given day for a given crop type
ax = axes[i%num_row, crop_id%num_col]
ax.title.set_text('{}'.format(crop_name))
ax.set_ylabel('RVI in Day {}'.format(day))
ax.set_xticks([])
ax.set_yticks([])
ax.imshow(rvi(X[i]*mask),cmap=plt.cm.summer)
#if one sample selected for a crop type, break the WHILE loop
pbar.set_description("Plotting {} Fields".format(crop_name))
pbar.update(1)
break
plt.tight_layout()
plt.show()
pbar.set_description("Plotting Complete!")
pbar.close()
#DIRECTORY OF SENTINEL-2 TRAINING DATA:
south_africa_s2_train_dir_1 = "data/sentinel-2/s2-utm-34S-19E-258N-2017.zip"
south_africa_s2_train_dir_2 = "data/sentinel-2/s2-utm-34S-19E-259N-2017.zip"
#INITIALIZE THE DATA READER TO OBSERVE THE FIELDS FROM S2 DATA:
# Choose some days of the year to plot
selected_data_indices = [12,18,24,30,36] #beware that S2 data is not daily,
#Initialize data reader for planet images
s2_reader = S2Reader(input_dir=south_africa_s2_train_dir_1,
label_dir=south_africa_tr_labels_dir_1,
selected_time_points=selected_data_indices)
INFO: Found folder in data/sentinel-2/s2-utm-34S-19E-258N-2017, no need to unzip INFO: Ignoring 3/1715 fields with area < 1000m2
INFO: Extracting time series into the folder: data/sentinel-2/s2-utm-34S-19E-258N-2017/time_series: 100%|██████████| 1712/1712 [00:00<00:00, 7886.99it/s]
#DEFINE TRUE COLOR IMAGING AND NDVI INDEXING FUNCTIONS FOR VISUALISATION OF S2 DATA:
#Define NDVI index for S2 images
def ndvi(X):
red = X[3]
nir = X[7]
return (nir-red) / (nir + red)
#Define True Color for S2 images
def true_color(X):
blue = X[1]/(X[1].max()/255.0)
green = X[2]/(X[2].max()/255.0)
red = X[3]/(X[3].max()/255.0)
tc = np.dstack((red,green,blue))
return tc.astype('uint8')
#VISUALISE SOME OF THE FIELDS FROM S2 DATA:
#Initialize plot cells
num_row = 2 * len(selected_days_of_year)
num_col = len(label_ids)
fig, axes = plt.subplots(num_row, num_col, figsize=(2.5*num_col,2*num_row))
#Display one sample field for each crop type
pbar = tqdm(total=len(label_ids))
iterable=iter(s2_reader)
for crop_id, crop_name in zip(label_ids,label_names):
while True:
# read a field sample
X,y,mask,fid = next(iterable)
width=X.shape[-1]
height=X.shape[-2]
#Get one sample for each crop type, and
#consider large areas (at least 60x60) to display
if y == crop_id and width>60 and height>60:
for i, day in enumerate(selected_data_indices):
# Display RGB image of the field in a given week for a given crop type
ax = axes[(2*i)%num_row, crop_id%num_col]
ax.title.set_text('{}'.format(crop_name))
ax.set_ylabel('RGB in Day {}'.format(day))
ax.set_xticks([])
ax.set_yticks([])
ax.imshow(true_color(X[i]))
# Display NDVI index of the field in a given day for a given crop type
ax = axes[(2*i+1)%num_row, crop_id%num_col]
ax.title.set_text('{}'.format(crop_name))
ax.set_ylabel('NDVI in Day {}'.format(day))
ax.set_xticks([])
ax.set_yticks([])
ax.imshow(ndvi(X[i]*mask),cmap=plt.cm.summer)
#if one sample selected for a crop type, break the WHILE loop
pbar.set_description("Plotting {} Fields".format(crop_name))
pbar.update(1)
break
plt.tight_layout()
plt.show()
pbar.set_description("Plotting Complete!")
pbar.close()
Please note that some of the true color views of agricultural fields observed above have been occluded with high level of clouds. You are free to use cloud probability mask CLP provided with Sentinel-2 as you can see how to reach it in file notebook/utils/sentinel_2_reader.py of this project.
This section offers some tips and pointers on how to possibly transform the data in a ML-ready format and build a sample ML model. Implementations here are mainly based on the approach in following paper:
Kondmann, Lukas, et al. (2021), DENETHOR: The DynamicEarthNET dataset for Harmonized, inter-Operable, analysis-Ready, daily crop monitoring
Please note that the below examples about data reading, data augmentation and ML training are based on standalone usage of each data sources (i.e. exploiting only Planet Fusion or Sentinel-2), but you are free to implement any processing pipeline to utilize the fusion/ensemble approaches on data level, feature level or decision level.
If you want to use eo-learn for these tasks, check out the documentation about existing tasks like filtering and pixel sampling here. You can easily implement your own task (as done above) by following this example. Here you can find a collection of examples including land cover and crop-type classification.
The following example initializes and trains a field-based crop-type classification model utilizing Planet Fusion data over Brandenburg area. However, by changing the type of data reader (i.e. from PlanetReader to S1Reader or S2Reader) and changing the training data directories, you can do the same training for other AOIs with different data sources. The example data readers are given under notebook/utils/ of this project:
#LET'S RECALL NECESSARY DIRECTORIES FOR TRAINING ON BRANDENBURG AREA:
brandenburg_planet_train_dir='data/planet/UTM-24000/33N/18E-242N/PF-SR/'
brandenburg_tr_labels_dir='data/brandenburg-gt/br-18E-242N-crop-labels-train-2018.geojson'
brandenburg_tr_labels=gpd.read_file(brandenburg_tr_labels_dir)
label_ids=brandenburg_tr_labels['crop_id'].unique()
label_names=brandenburg_tr_labels['crop_name'].unique()
#INITIALIZE DATA LOADERS FOR TRAINING AND EVALUATION:
#Get data transformer for planet images
planet_transformer=PlanetTransform()
#Initialize data reader for planet images
planet_reader = PlanetReader(input_dir=brandenburg_planet_train_dir,
label_dir=brandenburg_tr_labels_dir,
label_ids=label_ids,
transform=planet_transformer.transform,
min_area_to_ignore=1000)
#Initialize data loaders
data_loader=DataLoader(train_val_reader=planet_reader, validation_split=0.25)
train_loader=data_loader.get_train_loader(batch_size=8, num_workers=1)
valid_loader=data_loader.get_validation_loader(batch_size=8, num_workers=1)
INFO: Coordinate system of the data is: EPSG:32633 INFO: Ignoring 30/2534 fields with area < 1000m2
INFO: Extracting time series into the folder: /local_home/kuzu_ri/GIT_REPO/starter-pack-ai4food-v0.0.1/notebook/data/planet/UTM-24000/33N/18E-242N/PF-SR/time_series: 100%|██████████| 2504/2504 [00:00<00:00, 4264.66it/s]
INFO: Training data loader initialized. INFO: Validation data loader initialized.
#INITIALIZE TRAINING MODEL:
INPUT_DIM=4 # number of channels in Planet Fusion data
SEQUENCE_LENGTH=365 #Sequence size of Planet Fusion Data
DEVICE='cpu'
START_EPOCH=0
TOTAL_EPOCH=5
model = SpatiotemporalModel(input_dim=INPUT_DIM, num_classes=len(label_ids), sequencelength=SEQUENCE_LENGTH, device=DEVICE)
# OPTIONAL: trying gradient clipping to avoid loss being NaN.
clip_value = 1e-2
for p in model.parameters():
p.register_hook(lambda grad: torch.clamp(grad, -clip_value, clip_value))
INFO: model initialized with name:mobilenet_v3_small_LSTM
#INITIALIZE MODEL OPTIMIZER AND LOSS CRITERION:
#Initialize model optimizer and loss criterion:
optimizer = Adam(model.parameters(), lr=1e-3, weight_decay=1e-6)
loss_criterion = CrossEntropyLoss(reduction="mean")
#SET LOG DIRECTORY FOR THE TRAINING AND VALIDATION OUTPUTS:
log = list()
log_root='temp_planet/'
logdir = os.path.join(log_root, model.modelname)
os.makedirs(logdir, exist_ok=True)
print("INFO: Logging results will be saved to {}".format(logdir))
summarywriter = SummaryWriter(log_dir=logdir)
snapshot_path = os.path.join(logdir, "model.pth.tar")
INFO: Logging results will be saved to temp_planet/mobilenet_v3_small_LSTM
#IF THERE IS ALREADY A TRAINED MODEL, RESUME IT:
snapshot_path = os.path.join(logdir, "model.pth.tar")
if os.path.exists(snapshot_path):
checkpoint = torch.load(snapshot_path)
START_EPOCH = checkpoint["epoch"]
log = checkpoint["log"]
optimizer.load_state_dict(checkpoint["optimizer_state"])
model.load_state_dict(checkpoint["model_state"])
print(f"INFO: Resuming from {snapshot_path}, epoch {START_EPOCH}")
#DEFINE TRAINING AND VALIDATION EPOCH FUNCTIONS:
for epoch in range(START_EPOCH, TOTAL_EPOCH):
train_loss = tveu.train_epoch(model, optimizer, loss_criterion, train_loader, device=DEVICE)
valid_loss, y_true, y_pred, *_ = tveu.validation_epoch(model, loss_criterion, valid_loader, device=DEVICE)
scores = tveu.metrics(y_true.cpu(), y_pred.cpu())
scores_msg = ", ".join([f"{k}={v:.2f}" for (k, v) in scores.items()])
valid_loss = valid_loss.cpu().detach().numpy()[0]
train_loss = train_loss.cpu().detach().numpy()[0]
scores["epoch"] = epoch
scores["train_loss"] = train_loss
scores["valid_loss"] = valid_loss
log.append(scores)
summarywriter.add_scalars("losses", dict(train=train_loss, valid=valid_loss), global_step=epoch)
summarywriter.add_scalars("metrics",
{key: scores[key] for key in
['accuracy', 'kappa', 'f1_micro', 'f1_macro', 'f1_weighted',
'recall_micro','recall_macro', 'recall_weighted',
'precision_micro', 'precision_macro','precision_weighted']},
global_step=epoch)
cm = confusion_matrix(y_true=y_true, y_pred=y_pred.cpu().detach().numpy(), labels=np.arange(len(label_ids)))
summarywriter.add_figure("confusion_matrix",tveu.confusion_matrix_figure(cm, labels=label_ids),global_step=epoch)
log_df = pd.DataFrame(log).set_index("epoch")
log_df.to_csv(os.path.join(logdir, "train_log.csv"))
torch.save(dict( model_state=model.state_dict(),optimizer_state=optimizer.state_dict(), epoch=epoch, log=log),snapshot_path)
if len(log) > 2:
if valid_loss < np.array([l["valid_loss"] for l in log[:-1]]).min():
best_model = snapshot_path.replace("model.pth.tar","model_best.pth.tar")
print(f"INFO: New best model with valid_loss {valid_loss:.2f} at {best_model}")
shutil.copy(snapshot_path, best_model)
print(f"INFO: epoch {epoch}: train_loss {train_loss:.2f}, valid_loss {valid_loss:.2f} " + scores_msg)
train loss=2.57: 100%|██████████| 235/235 [02:13<00:00, 1.77it/s] valid loss=2.24: 100%|██████████| 79/79 [00:19<00:00, 4.09it/s]
INFO: epoch 0: train_loss 2.51, valid_loss 2.21 accuracy=0.13, kappa=0.01, f1_micro=0.13, f1_macro=0.06, f1_weighted=0.08, recall_micro=0.13, recall_macro=0.12, recall_weighted=0.13, precision_micro=0.13, precision_macro=0.12, precision_weighted=0.30
The training steps can be also monitored by using Tensorboard, as shown below:
!tensorboard --logdir logdir
The following example initializes and trains a field-based crop-type classification model utilizing Sentinel 2 data over South-Africa area. However, by changing the type of data reader (i.e. from PlanetReader to S1Reader or S2Reader) and changing the training data directories, you can do the same training for other AOIs with different data sources. The example data readers are given under notebook/utils/ of this project:
#LET'S RECALL NECESSARY DIRECTORIES FOR TRAINING ON A TILE OF SOUTH AFRICA:
south_africa_s2_train_dir_1='data/sentinel-2/s2-utm-34S-19E-258N-2017.zip'
south_africa_tr_labels_dir_1='data/south-africa-gt/sa-19E-258N-crop-labels-train-2017.geojson'
south_africa_s2_train_dir_2='data/sentinel-2/s2-utm-34S-19E-259N-2017.zip'
south_africa_tr_labels_dir_2='data/south-africa-gt/sa-19E-259N-crop-labels-train-2017.geojson'
south_africa_tr_labels_1=gpd.read_file(south_africa_tr_labels_dir_1)
label_ids=south_africa_tr_labels_1['crop_id'].unique()
label_names=south_africa_tr_labels_1['crop_name'].unique()
#INITIALIZE DATA LOADERS FOR TRAINING AND EVALUATION:
#Get data transformer for S2 images
sentinel_2_transformer=Sentinel2Transform()
#Initialize data reader for S2 images
s2_reader = S2Reader(input_dir=south_africa_s2_train_dir_1,
label_dir=south_africa_tr_labels_dir_1,
label_ids=label_ids,
transform=sentinel_2_transformer.transform,
min_area_to_ignore=1000)
#Initialize data loaders
data_loader=DataLoader(train_val_reader=s2_reader, validation_split=0.25)
train_loader=data_loader.get_train_loader(batch_size=8, num_workers=1)
valid_loader=data_loader.get_validation_loader(batch_size=8, num_workers=1)
INFO: Found folder in data/sentinel-2/s2-utm-34S-19E-258N-2017, no need to unzip INFO: Ignoring 3/1715 fields with area < 1000m2
INFO: Extracting time series into the folder: data/sentinel-2/s2-utm-34S-19E-258N-2017/time_series: 100%|██████████| 1712/1712 [00:00<00:00, 8141.26it/s]
INFO: Training data loader initialized. INFO: Validation data loader initialized.
#INITIALIZE TRAINING MODEL:
INPUT_DIM=12 # number of channels in S2 data
SEQUENCE_LENGTH=76 #Sequence size of Planet Fusion Data
DEVICE='cpu'
START_EPOCH=0
TOTAL_EPOCH=1
model = SpatiotemporalModel(input_dim=INPUT_DIM, num_classes=len(label_ids), sequencelength=SEQUENCE_LENGTH, device=DEVICE)
# OPTIONAL: trying gradient clipping to avoid loss being NaN.
clip_value = 1e-2
for p in model.parameters():
p.register_hook(lambda grad: torch.clamp(grad, -clip_value, clip_value))
INFO: model initialized with name:mobilenet_v3_small_LSTM
#INITIALIZE MODEL OPTIMIZER AND LOSS CRITERION:
#Initialize model optimizer and loss criterion:
optimizer = Adam(model.parameters(), lr=1e-3, weight_decay=1e-6)
loss_criterion = CrossEntropyLoss(reduction="mean")
#SET LOG DIRECTORY FOR THE TRAINING AND VALIDATION OUTPUTS:
log = list()
log_root='temp_s2/'
logdir = os.path.join(log_root, model.modelname)
os.makedirs(logdir, exist_ok=True)
print("INFO: Logging results will be saved to {}".format(logdir))
summarywriter = SummaryWriter(log_dir=logdir)
snapshot_path = os.path.join(logdir, "model.pth.tar")
INFO: Logging results will be saved to temp_s2/mobilenet_v3_small_LSTM
#IF THERE IS ALREADY A TRAINED MODEL, RESUME IT:
snapshot_path = os.path.join(logdir, "model.pth.tar")
if os.path.exists(snapshot_path):
checkpoint = torch.load(snapshot_path)
START_EPOCH = checkpoint["epoch"]
log = checkpoint["log"]
optimizer.load_state_dict(checkpoint["optimizer_state"])
model.load_state_dict(checkpoint["model_state"])
print(f"INFO: Resuming from {snapshot_path}, epoch {START_EPOCH}")
#DEFINE TRAINING AND VALIDATION EPOCH FUNCTIONS:
for epoch in range(START_EPOCH, TOTAL_EPOCH):
train_loss = tveu.train_epoch(model, optimizer, loss_criterion, train_loader, device=DEVICE)
valid_loss, y_true, y_pred, *_ = tveu.validation_epoch(model, loss_criterion, valid_loader, device=DEVICE)
scores = tveu.metrics(y_true.cpu(), y_pred.cpu())
scores_msg = ", ".join([f"{k}={v:.2f}" for (k, v) in scores.items()])
valid_loss = valid_loss.cpu().detach().numpy()[0]
train_loss = train_loss.cpu().detach().numpy()[0]
scores["epoch"] = epoch
scores["train_loss"] = train_loss
scores["valid_loss"] = valid_loss
log.append(scores)
summarywriter.add_scalars("losses", dict(train=train_loss, valid=valid_loss), global_step=epoch)
summarywriter.add_scalars("metrics",
{key: scores[key] for key in
['accuracy', 'kappa', 'f1_micro', 'f1_macro', 'f1_weighted',
'recall_micro','recall_macro', 'recall_weighted',
'precision_micro', 'precision_macro','precision_weighted']},
global_step=epoch)
cm = confusion_matrix(y_true=y_true, y_pred=y_pred.cpu().detach().numpy(), labels=np.arange(len(label_ids)))
summarywriter.add_figure("confusion_matrix",tveu.confusion_matrix_figure(cm, labels=label_ids),global_step=epoch)
log_df = pd.DataFrame(log).set_index("epoch")
log_df.to_csv(os.path.join(logdir, "train_log.csv"))
torch.save(dict( model_state=model.state_dict(),optimizer_state=optimizer.state_dict(), epoch=epoch, log=log),snapshot_path)
if len(log) > 2:
if valid_loss < np.array([l["valid_loss"] for l in log[:-1]]).min():
best_model = snapshot_path.replace("model.pth.tar","model_best.pth.tar")
print(f"INFO: New best model with valid_loss {valid_loss:.2f} at {best_model}")
shutil.copy(snapshot_path, best_model)
print(f"INFO: epoch {epoch}: train_loss {train_loss:.2f}, valid_loss {valid_loss:.2f} " + scores_msg)
train loss=1.22: 100%|██████████| 161/161 [02:11<00:00, 1.22it/s] valid loss=1.95: 100%|██████████| 54/54 [00:17<00:00, 3.17it/s]
INFO: epoch 0: train_loss 1.74, valid_loss 1.46 accuracy=0.28, kappa=-0.01, f1_micro=0.28, f1_macro=0.10, f1_weighted=0.14, recall_micro=0.28, recall_macro=0.20, recall_weighted=0.28, precision_micro=0.28, precision_macro=0.12, precision_weighted=0.20
A valid submission entails generating a JSON file containing four columns, named fid, crop_id, crop_name and crop_prop standing for field ID, predicted crop ID, predicted crop name and the softmax probabilities as an array in the size of number of classes. For instance:
fid: 2685 -> INTEGER crop_id: 1 -> INTEGERcrop_name: Wheat -> STRINGcrop_prop: [0.30404267, 0.22208424, 0.10115303, 0.22929858, 0.14342153] -> FLOAT ARRAY OF CLASS PROBABILITIEScomposes a row of a prediction for a field in South Africa, where 5 different classes are provided as:
The following code blocks show how to generate a valid submission using a crop type prediction model:
Recall test data directory and test field polygons in order to initialize the data reader. In this example, we are generating a submission file for South Africa:
#DIRECTORIES FOR TEST DATA:
south_africa_s2_test_dir='data/sentinel-2/s2-utm-34S-20E-259N-2017.zip'
south_africa_te_labels_dir='data/south-africa-gt/sa-20E-259N-crop-labels-test-2017.geojson'
#INITIALIZE TEST DATA READER FOR S2 IMAGES:
s2_test_reader = S2Reader(input_dir=south_africa_s2_test_dir,
label_dir=south_africa_te_labels_dir,
transform=Sentinel2Transform().transform,
min_area_to_ignore=0)
INFO: Found folder in data/sentinel-2/s2-utm-34S-20E-259N-2017, no need to unzip INFO: Ignoring 0/2417 fields with area < 0m2
INFO: Extracting time series into the folder: data/sentinel-2/s2-utm-34S-20E-259N-2017/time_series: 100%|██████████| 2417/2417 [00:41<00:00, 58.57it/s]
#SET THE OUTPUT DIRECTORY FOR THE JSON SUBMISSION FILE
output_name = logdir = os.path.join(log_root, '34S-20E-259N-2017-submission.json')
# RESUME BEST MODEL FOR MAKING PREDICTIONS ON THE TEST DATA:
best_model = os.path.join(logdir, "model_best.pth.tar")
if os.path.exists(best_model):
checkpoint = torch.load(best_model)
START_EPOCH = checkpoint["epoch"]
log = checkpoint["log"]
optimizer.load_state_dict(checkpoint["optimizer_state"])
model.load_state_dict(checkpoint["model_state"])
print(f"INFO: Resuming from {best_model}, epoch {START_EPOCH}")
# FILL A LIST OF DICTIONARY WITH THE PREDICTIONS:
output_list=[]
softmax=torch.nn.Softmax()
for X,_,_,fid in tqdm(s2_test_reader, total=len(s2_test_reader), position=0, leave=True,
desc="INFO: Saving predictions for each field into a dictionary:"):
logits = model(X.unsqueeze(0))
predicted_probabilities = softmax(logits).detach().numpy()[0]
predicted_class = np.argmax(predicted_probabilities)
output_list.append({'fid': fid,
'crop_id': label_ids[predicted_class],
'crop_name': label_names[predicted_class],
'crop_probs': predicted_probabilities[label_ids-1]})
# SAVE THE PREDICTIONS INTO THE OUTPUT JSON:
output_frame = pd.DataFrame.from_dict(output_list)
output_frame.to_json(output_name)
print('Submission was saved to location: {}'.format(output_name))
Submission was saved to location: temp_s2/34S-20E-259N-2017-submission.json
# CHECK IF OUTPUT JSON FILE CONTAINS A PREDICTION FOR EACH CROP FIELD:
south_africa_te_labels=gpd.read_file(south_africa_te_labels_dir)
assert len(output_frame) == len(south_africa_te_labels), "WARNING: The size of the predictions is not equal to number of test fields. " \
"Number of test fields: {}, Number of predictions: {}".format(len(output_frame),len(south_africa_te_labels))
# VISUALISE THE PREDICTION FILE:
output_frame.tail()
| fid | crop_id | crop_name | crop_probs | |
|---|---|---|---|---|
| 2412 | 272354 | 2 | Barley | [0.24382672, 0.39065287, 0.132137, 0.06687267,... |
| 2413 | 272355 | 3 | Canola | [0.14091946, 0.2702931, 0.40873554, 0.06002968... |
| 2414 | 272429 | 3 | Canola | [0.22041656, 0.23963875, 0.25817445, 0.1381026... |
| 2415 | 272640 | 2 | Barley | [0.23871893, 0.3895439, 0.12371767, 0.09559201... |
| 2416 | 272740 | 3 | Canola | [0.14856979, 0.24840996, 0.35579813, 0.1478064... |
NOTE: you can now send the submission for evaluation.